Abstract: We study the capacity of heterogeneous distributed
storage systems under repair dynamics. Examples of these sys- 
tems include peer-to-peer storage clouds, wireless, and Internet 
caching systems. Nodes in a heterogeneous system can have 
different storage capacities and different repair bandwidths. We 
give lower and upper bounds on the system capacity. These 
bounds depend on either the average resources per node, or 
on a detailed knowledge of the node characteristics. Moreover, 
we study the case in which nodes may be compromised by an 
eavesdropper, and give bounds on the system secrecy capacity. 
One implication of our results is that symmetric repair maximizes 
the capacity of a homogeneous system, which justifies the model 
widely used in the literature. 

I. Introduction 

Cloud storage has emerged in recent years as an inexpensive 
and scalable solution for storing large amounts of data and 
making it pervasively available to users. The growing success 
of cloud storage has been accompanied by new advances in 
the theory of such systems, namely the application of network 
coding techniques for distributed data storage and the theory of 
regenerating codes introduced by Dimakis et al. [1], followed
by a large body of further work in the literature. 

Cloud storage systems are typically built using a large 
number of inexpensive commodity disks that fail frequently, 
making failures "the norm rather than the exception" [2].
Therefore, it is a prime concern to achieve fault-tolerance 
in these systems and minimize the probability of losing the 
stored data. The recent theoretical results uncovered funda- 
mental tradeoffs among system resources (storage capacity, 
repair bandwidth, etc.) that are necessary to achieve fault- 
tolerance. They also provided novel code constructions for data 
redundancy schemes that can achieve these tradeoffs in certain 
cases; see for example [3], [4] and [5].

The majority of the results in the literature of this field 
focus on a homogeneous model when studying the information 
theoretic limits on the performance of distributed storage 
systems. In a homogeneous system all the nodes (hard disks 
or other storage devices) have the same parameters (storage 
capacity, repair bandwidth, etc.). This model encompasses 
many real-world storage systems such as clusters in a data
center, and has been instrumental in forming the engineering 
intuition for understanding these systems. Recent developments
have included the emergence of heterogeneous systems that 
pool together nodes from different sources and with different 
characteristics to form one big reliable cloud storage system. 
Examples include peer-to-peer (p2p), or hybrid (p2p-assisted) 
cloud storage systems [6], [7], Internet caching systems for
video-on-demand applications [8], [9], and caching systems in
heterogeneous wireless networks [10]. Motivated by these applications,
we study the capacity of heterogeneous distributed
storage systems (DSS) here under reliability and secrecy 
constraints. 

Contributions: The capacity of a DSS is defined as the 
maximum amount of information that can be delivered to any 
user contacting k out of n nodes in the system. Intuitively, in 
a heterogeneous system, this capacity should be limited by the 
"weakest" nodes. However, nodes can have different storage 
capacities and different repair bandwidths, and the tension
between these two sets of parameters makes it challenging to
identify which nodes are the "weakest". 

Our first result establishes an upper bound on the capacity 
of a DSS that depends on the average resources in the system 
(average storage capacity and average repair bandwidth per 
node). We use this bound to prove that symmetric repair, i.e., 
downloading an equal amount of data from each helper node,
maximizes the capacity of a homogeneous DSS. While the 
optimality of symmetric repair is known for the special case 
of MDS codes [11], our results assert that symmetric repair 
is always optimal for any choice of system parameters. Fur- 
ther, our proof avoids the combinatorial cut-based arguments 
typically used in this context.

In addition, we give an expression for the capacity when we 
know the characteristics of all the nodes in the system (not just 
the averages). This expression may be hard to compute, but we 
use it to derive additional bounds that are easy to evaluate. Our 
techniques generalize to the scenario in which the system is 
compromised by an eavesdropper.¹ We give bounds on the
secrecy capacity when the system is supposed to leak no 
information to the eavesdropper (perfect secrecy). Here too, 
we show that symmetric repair maximizes the secrecy capacity 
of a homogeneous system. 

¹ Our results also generalize to the case of a malicious adversary who can
corrupt the stored data. This model will be included in the extended version 
of this paper. 



Related work: Wu proved the optimality of symmetric
repair in [11] for the special case of a DSS using Maximum
Distance Separable (MDS) codes. Coding schemes for a
non-homogeneous storage system with one super-node that is
more reliable and has more storage capacity were studied in
[12]. References [13] and [14] studied the problem of storage
allocations in distributed systems under a total storage budget
constraint. Pawar et al. [15], [16] studied the secure capacity of
distributed storage systems under eavesdropping and malicious
attacks.

Organization: Our paper is organized as follows. In
Section II we describe our model for heterogeneous DSS and
set up the notation. In Section III we summarize our main
results. In Section IV we prove our bounds on the capacity
of a heterogeneous DSS. In Section V we study the secrecy
capacity in the presence of an eavesdropper. We conclude in
Section VI and discuss some open problems. We postpone
some of the proofs to the Appendix, where we also discuss the
generalizability of our results from functional to exact repair.

II. Model 

A heterogeneous distributed storage system is formed of
$n$ storage nodes $v_1, \dots, v_n$ with storage capacities $\alpha_1, \dots, \alpha_n$,
respectively. Unless stated otherwise, we assume that the nodes
are indexed in increasing order of capacity, i.e.,
$\alpha_1 \le \alpha_2 \le \dots \le \alpha_n$. In a homogeneous system all nodes have the
same storage capacity $\alpha$, i.e., $\alpha_i = \alpha, \forall i$. As a reliability
requirement, a user should be able to obtain a file by contacting
any $k < n$ nodes in the DSS. The nodes forming the system
are unreliable and can fail. The system is repaired from a
failure by replacing the failed node with a new node. Upon
joining the system, the new node downloads its data from $d$,
$k \le d \le n - 1$, helper nodes in the system.

The repair process can be either exact or functional. In the 
case of exact repair, the new node is required to store an exact 
copy of the data that was stored on the failed node. Whereas 
in the case of functional repair, the data stored on the new 
node does not have to be an exact copy of the lost data, but 
merely "functionally equivalent" in the sense that it preserves 
the property that contacting any k out of n nodes is sufficient 
to reconstruct a stored file. We focus on functional repair in 
this paper, although some of our results do generalize to the 
exact repair model (see the discussion in Appendix A).

An important system parameter is the repair bandwidth,
which refers to the total amount of data downloaded by the
new node. In a homogeneous system, the repair bandwidth,
denoted by $\gamma$, is the same for any new node joining the system.
The typical model adopted in the literature assumes symmetric
repair in which the total repair bandwidth $\gamma$ is divided equally
among the $d$ helpers. Thus, the new node downloads $\beta = \gamma/d$
amount of information from each helper. In a heterogeneous
system the repair bandwidth can vary depending on which
node has failed and which nodes are helping in the repair
process. We denote by $\beta_{ijS}$ the amount of information that a
new node replacing the failed node $v_j$ downloads from
helper node $v_i$ when the helper nodes belong to the index
set $S$ ($i \in S$, $|S| = d$). An important special case is when the
repair bandwidth per helper depends only on the identity of
the helper node and not on the identity of the failed node
or the other helpers. In this case, we say that helper node $v_i$
has repair bandwidth $\beta_i$, i.e., $\beta_{ijS} = \beta_i, \forall j, S$. In the case
of a homogeneous system with symmetric repair, we have

$$\beta_{ijS} = \beta = \gamma/d, \quad \forall i, j, S.$$

We focus on repair from single node failures.² In this case,
there are $\binom{n-1}{d}$ possibilities for the set of helpers $S$. Therefore,
the average repair bandwidth $\gamma_j$ of node $v_j$ is

$$\gamma_j = \binom{n-1}{d}^{-1} \sum_{\substack{S : |S| = d \\ j \notin S}} \; \sum_{i \in S} \beta_{ijS}. \qquad (1)$$

We denote by $\bar\gamma = \frac{1}{n} \sum_{j=1}^{n} \gamma_j$ and $\bar\alpha = \frac{1}{n} \sum_{j=1}^{n} \alpha_j$ the
average total repair bandwidth and average node capacity in
the DSS, respectively.
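As a rough illustration of these averages, the following Python sketch computes $\gamma_j$, $\bar\gamma$ and $\bar\alpha$ for the special case in which the bandwidth depends only on the helper ($\beta_{ijS} = \beta_i$). It assumes that $\gamma_j$ is the average, over all helper sets $S$ of size $d$ that exclude $v_j$, of the total amount downloaded. The function name `average_resources` is chosen here for illustration, and the input values are those of the three-node example of Fig. 1.

```python
from fractions import Fraction
from itertools import combinations

def average_resources(alpha, beta, d):
    """Return ([gamma_j], alpha_bar, gamma_bar), assuming beta_ijS = beta_i."""
    n = len(alpha)
    gammas = []
    for j in range(n):
        # Helpers are drawn from the n - 1 surviving nodes.
        others = [i for i in range(n) if i != j]
        helper_sets = list(combinations(others, d))
        # Total download summed over every helper set, then averaged.
        total = sum(sum(beta[i] for i in S) for S in helper_sets)
        gammas.append(Fraction(total, len(helper_sets)))
    alpha_bar = Fraction(sum(alpha), n)
    gamma_bar = sum(gammas, Fraction(0)) / n
    return gammas, alpha_bar, gamma_bar

# Three-node example: capacities (1, 2, 2), bandwidths (1, 2, 2), d = 2.
gammas, alpha_bar, gamma_bar = average_resources([1, 2, 2], [1, 2, 2], d=2)
```

With these inputs the sketch gives $\gamma_1 = 4$, $\gamma_2 = \gamma_3 = 3$, $\bar\alpha = 5/3$ and $\bar\gamma = 10/3$, which are the averages used in Section IV-A.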

We are interested in finding the capacity $C$ of a heterogeneous
system. The capacity $C$ represents the maximum
amount of information that can be downloaded by any user
contacting $k$ out of the $n$ nodes in the system. Recall from [1]
that the capacity $C^{ho}$ of a homogeneous system implementing
symmetric repair is given by

$$C^{ho}(\alpha, \gamma) = \sum_{i=1}^{k} \min\left\{\alpha, (d - i + 1)\frac{\gamma}{d}\right\}. \qquad (2)$$
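The expression in (2) is straightforward to evaluate numerically. The sketch below is one possible way to do so; the function name `homogeneous_capacity` and the sample parameters are illustrative choices, not values from the text. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def homogeneous_capacity(alpha, gamma, k, d):
    """Evaluate (2): sum over i = 1..k of min(alpha, (d - i + 1) * gamma / d)."""
    beta = Fraction(gamma) / d  # per-helper download under symmetric repair
    return sum(min(Fraction(alpha), (d - i + 1) * beta) for i in range(1, k + 1))

# Hypothetical parameters: alpha = 4, gamma = 6, k = 3, d = 4.
example = homogeneous_capacity(4, 6, k=3, d=4)
```

For these hypothetical parameters the first two terms are limited by storage and the third by bandwidth, giving $4 + 4 + 3 = 11$.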



We are also interested in characterizing the secrecy capacity
of the system when some nodes are compromised by an
eavesdropper. We follow the model in [15] and [16] and
denote by $\ell$, $\ell < k$, the number of compromised nodes. The
eavesdropper is assumed to be passive. She can read the data
downloaded during repair and stored on a compromised node.
We are interested here in information theoretic secrecy which
characterizes the fundamental ability of the system to provide
data confidentiality independently of cryptographic methods.
The secrecy capacity of the system, denoted by $C_s$, is defined
as the maximum amount of information that can be delivered
to a user without revealing any information to the eavesdropper
(perfect secrecy). We denote by $C_s^{ho}$ the secrecy capacity of
a homogeneous system with symmetric repair. Finding $C_s^{ho}$ is
still an open problem in general. The following upper bound
was shown to hold in [15] and [16]:

$$C_s^{ho}(\alpha, \gamma, \ell) \le \sum_{i=\ell+1}^{k} \min\left\{\alpha, (d - i + 1)\frac{\gamma}{d}\right\}. \qquad (3)$$

III. Main Results



We start by summarizing our results. Theorem 1 gives a
general upper bound on the storage capacity of a heteroge- 
neous DSS as a function of the average resources per node. 

² Multiple failures can be repaired independently as long as there are at
least $d$ helper nodes in the system. For another model of repair that assumes
cooperation when repairing multiple failures in homogeneous systems, refer
to [17] and [18].



[Figure 1: (a) the original three-node system storing the file $(x, y, z)$; (b) the combined system storing the files $(x_i, y_i, z_i)$, $i = 1, \dots, 6$.]

Fig. 1. An example that illustrates the proof of the upper bound (4) on the capacity of a heterogeneous system. (a) A heterogeneous distributed storage system
(DSS) with $(n, k, d) = (3, 2, 2)$. The nodes have storage capacities $\alpha_1 = 1$, $\alpha_2 = \alpha_3 = 2$ and the repair bandwidths per helper are $\beta_1 = 1$, $\beta_2 = \beta_3 = 2$.
(b) A DSS constructed by combining together $n! = 6$ copies of the original heterogeneous system corresponding to all possible node permutations. The
obtained DSS is homogeneous with uniform storage per node $\alpha = 10$ and repair bandwidth per helper $\beta = 10$. The capacity of this system is 20 as given
by (2) [1]. Any code that stores a file of size $C$ ($C = 3$ here) on the original DSS can be transformed into a scheme that stores a file of size $n!\,C = 6C$ in
the "bigger" system. This gives the upper bound in (4): $C \le 20/6 = 10/3$.



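Theorem 1: The capacity $C$ of a heterogeneous distributed
storage system, with node average capacity $\bar\alpha$ and average
repair bandwidth $\bar\gamma$, is upper bounded by

$$C \le \sum_{i=1}^{k} \min\left\{\bar\alpha, (d - i + 1)\frac{\bar\gamma}{d}\right\} = C^{ho}(\bar\alpha, \bar\gamma), \qquad (4)$$

where $\bar\alpha$ and $\bar\gamma$ are the average node capacity and average repair bandwidth defined in Section II.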
The right-hand side term in (4) is the capacity of a
homogeneous system in (2) in which all nodes have storage
$\alpha = \bar\alpha$ and total repair bandwidth $\gamma = \bar\gamma$. Th. 1 states that
the capacity of a DSS cannot exceed that of a homogeneous
system where the total system resources are split equally
among all the nodes. Moreover, Th. 1 implies that symmetric
repair is optimal in homogeneous systems in the sense that
it maximizes the system capacity. This justifies the repair
model adopted in the literature. This result is stated formally
in Cor. 2.

While the optimality of symmetric repair is known for
the special case of MDS codes [11], Cor. 2 asserts that
symmetric repair is always optimal for any choice of system
parameters. This result follows directly from Th. 1 and avoids
the combinatorial cut-based arguments that may be needed in
a more direct proof.

Corollary 2: In a homogeneous DSS with node capacity $\alpha$
and total repair bandwidth $\gamma$, symmetric repair maximizes the
system capacity.

When we know the parameters of the nodes in the system
beyond the averages, we can obtain possibly tighter bounds as
described in Th. 3. To simplify the notation, let us order the
repair bandwidths per helper $\beta_{ijS}$ into an increasing sequence
$\beta'_1, \beta'_2, \dots, \beta'_m$, such that $\beta'_l \le \beta'_{l+1}$ and where $m = nd\binom{n-1}{d}$.
Also, recall that $\alpha_1 \le \alpha_2 \le \dots \le \alpha_n$.

Theorem 3: The capacity $C$ of a heterogeneous DSS is
bounded by

$$C_{\min} \le C \le C_{\max},$$



C max = ;= min k ( a i + E #> 
When the system is compromised by an eavesdropper, the
system secrecy capacity can be upper bounded as follows. 

Theorem 4: The secrecy capacity $C_s$ of a DSS when $\ell$
nodes in the system are compromised by an eavesdropper is
upper bounded by

$$C_s \le \sum_{i=\ell+1}^{k} \min\left\{\bar\alpha, (d - i + 1)\frac{\bar\gamma}{d}\right\}. \qquad (5)$$



This theorem implies that symmetric repair also maximizes 
the secrecy capacity of a homogeneous DSS. 

IV. Capacity of Heterogeneous DSS 

A. Example & Proof of Theorem 1

We illustrate the proof of Th. 1 through an example for the
special case in which the bandwidths depend only on the identity
of the helper node. We compute the capacity of the DSS for
this specific example, and show that it is strictly less than the
upper bound of Th. 1. That is, it does not achieve the capacity
of a homogeneous system with the same average characteristics.
More specifically, consider the heterogeneous DSS depicted in
Fig. 1(a) with $(n, k, d) = (3, 2, 2)$ formed of 3 storage nodes
$v_1, v_2$ and $v_3$ with storage capacities $(\alpha_1, \alpha_2, \alpha_3) = (1, 2, 2)$
and repair bandwidths $(\beta_1, \beta_2, \beta_3) = (1, 2, 2)$. The average
node capacity is $\bar\alpha = 5/3$ and the average repair bandwidth is $\bar\gamma = 10/3$.
Th. 1 gives that the capacity of this DSS satisfies $C \le 10/3 \approx 3.33$.



For this example, it is easy to see that the DSS capacity
is $C = 3 < 10/3$. In fact, a user contacting nodes $v_1$ and
$v_2$ cannot download more information than their total storage
$\alpha_1 + \alpha_2 = 3$. This upper bound is achieved by the code in
Fig. 1(a). The code stores a file of 3 units $(x, y, z)$ in the
system. During repair the new node downloads the whole
file and stores the lost piece of the data (note that the repair
bandwidth constraints allow this trivial repair).

To obtain the upper bound in (4), we use the original
heterogeneous DSS to construct a "bigger" homogeneous
system. We obtain this new system by "gluing" together
$n! = 3! = 6$ copies of the original DSS as shown in Fig. 1(b).
Each copy corresponds to a different permutation of the nodes.
In the figure, the $i$th copy stores the file $(x_i, y_i, z_i)$. For
example in Fig. 1(b), the first copy is the original system itself,
the second corresponds to node $v_1$ and node $v_3$ switching
positions, and so on.

The "bigger" system is homogeneous because all its nodes
have storage $\alpha = 10$ and repair bandwidth per helper $\beta = \gamma/d = 10$.
The capacity $C'$ of this system can be computed from (2):

$$C' = \sum_{i=1}^{k} \min\left\{\alpha, (d - i + 1)\frac{\gamma}{d}\right\} = 20. \qquad (6)$$



As seen in Fig. 1, any scheme that can store a file of size
$C$ in the original DSS can be transformed into a scheme that
can store a file of size $n!\,C$ in the "bigger" DSS. Therefore,
we get $n!\,C \le C'$ and $C \le 10/3$. This argument can be
directly generalized to arbitrary heterogeneous systems. The
general proof follows the same steps explained above and can
be found in Appendix B.
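The gluing argument can be checked numerically for this example. The following sketch, written under the assumption that the bandwidth depends only on the helper node, sums the node parameters over all $n!$ permutations and evaluates (2) for the resulting system. The helper names `glue_permutations` and `homogeneous_capacity` are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def homogeneous_capacity(alpha, gamma, k, d):
    """Evaluate (2) for a homogeneous system with symmetric repair."""
    beta = Fraction(gamma) / d
    return sum(min(Fraction(alpha), (d - i + 1) * beta) for i in range(1, k + 1))

def glue_permutations(alpha, beta):
    """Sum storage and per-helper bandwidth, position by position, over all permutations."""
    n = len(alpha)
    big_alpha = [0] * n
    big_beta = [0] * n
    for sigma in permutations(range(n)):
        for position, node in enumerate(sigma):
            big_alpha[position] += alpha[node]
            big_beta[position] += beta[node]
    return big_alpha, big_beta

n, k, d = 3, 2, 2
big_alpha, big_beta = glue_permutations([1, 2, 2], [1, 2, 2])
# Every position ends up with the same totals, so the big system is homogeneous.
big_capacity = homogeneous_capacity(big_alpha[0], d * big_beta[0], k, d)
# Dividing by n! gives the bound on the original system.
upper_bound = big_capacity / factorial(n)
```

Each position of the combined system accumulates storage 10 and per-helper bandwidth 10, the combined capacity is 20, and dividing by $3! = 6$ recovers the bound $10/3$.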

Theorem 1 implies that symmetric repair, i.e., downloading
equal numbers of bits from each of the helpers, is optimal in a
homogeneous system. To see this, consider a DSS with node
storage capacity $\alpha$, and a total repair bandwidth budget $\gamma$. A
new node joining the system has the flexibility to arbitrarily
split its repair bandwidth among the $d$ helpers as long as the
total amount of downloaded information does not exceed $\gamma$. In
other words, we have $\sum_{i \in S} \beta_{ijS} = \gamma, \forall j, S$. Now, irrespective
of how each new node splits its bandwidth budget, the average
repair bandwidth in the system is the same, $\bar\gamma = \gamma$. If we apply
Th. 1 we get an upper bound that matches exactly the capacity
in (2) of a homogeneous DSS with symmetric repair. Hence,
we obtain the result in Cor. 2.
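The step $\bar\gamma = \gamma$ can be spelled out as follows, assuming the average $\gamma_j$ is taken uniformly over the $\binom{n-1}{d}$ helper sets that exclude the failed node, as described in Section II.

```latex
\gamma_j
  = \binom{n-1}{d}^{-1} \sum_{\substack{S : |S| = d \\ j \notin S}} \; \sum_{i \in S} \beta_{ijS}
  = \binom{n-1}{d}^{-1} \sum_{\substack{S : |S| = d \\ j \notin S}} \gamma
  = \gamma,
\qquad\text{so}\qquad
\bar\gamma = \frac{1}{n} \sum_{j=1}^{n} \gamma_j = \gamma
\quad\text{and}\quad
C \le C^{ho}(\alpha, \gamma).
```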

B. Proof of Theorem 3

To avoid heavy notation, we focus on the case in which the
repair bandwidth depends only on the helper node ($\beta_{ijS} = \beta_i$).
We give in Th. 5 lower and upper bounds specific to this
case. These bounds are similar to the ones in Th. 3 but can
be tighter. The proof of Th. 3 follows the exact steps of the
proof below and will be omitted here. Again, we assume that
the nodes are indexed in increasing order of node capacity,
$\alpha_1 \le \alpha_2 \le \dots \le \alpha_n$. We also order the values of the repair
bandwidths $\beta_i$ to obtain the increasing sequence
$\beta'_1 \le \beta'_2 \le \dots \le \beta'_n$.



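Theorem 5: The capacity $C$ of a heterogeneous DSS, in
which the repair bandwidth depends only on the identity of
the helper node, is bounded as $C'_{\min} \le C \le C'_{\max}$, where

$$C'_{\min} = \sum_{i=1}^{k} \min\left(\alpha_i, \beta'_1 + \beta'_2 + \dots + \beta'_{d-i+1}\right) = \min_{l = 0, \dots, k} \left( \sum_{i=1}^{l} \alpha_i + \sum_{j=1}^{k-l} \sum_{i=1}^{d-l-j+1} \beta'_i \right) \qquad (7)$$

and

$$C'_{\max} = \sum_{i=1}^{k} \min\left(\alpha_i, \beta'_{i+1} + \beta'_{i+2} + \dots + \beta'_{d+1}\right) = \min_{l = 0, \dots, k} \left( \sum_{i=1}^{l} \alpha_i + \sum_{j=1}^{k-l} \sum_{i=l+1+j}^{d+1} \beta'_i \right). \qquad (8)$$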



The second expressions for $C'_{\min}$ and $C'_{\max}$ highlight the
analogy with the bounds in Th. 3. Before proving Th. 5 we
give a couple of illustrative examples and discuss some special
cases.

Example 6: Consider again the example in the previous
section where $(n, k, d) = (3, 2, 2)$ and where the node
parameters are $(\alpha_1, \beta_1) = (1, 1)$, $(\alpha_2, \beta_2) = (\alpha_3, \beta_3) = (2, 2)$. Here,
$C'_{\min} = 2$ and $C'_{\max} = 3$. Note that here $C'_{\max}$ is tighter than
the average-based upper bound of Th. 1 which gives $C \le 3.33$.
Recall that the capacity for this system is $C = 3 = C'_{\max}$.
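The two bounds of Th. 5 are simple to evaluate. The sketch below assumes the reading of (7) and (8) in which the $i$th term pairs $\alpha_i$ with the $d - i + 1$ smallest bandwidths for $C'_{\min}$, and with $\beta'_{i+1}, \dots, \beta'_{d+1}$ for $C'_{\max}$. The function name `theorem5_bounds` is hypothetical, and the inputs are the parameters of Example 6.

```python
def theorem5_bounds(alpha, beta, k, d):
    """Return (C'_min, C'_max) under the assumed reading of (7) and (8)."""
    a = sorted(alpha)  # a[i - 1] is alpha_i
    b = sorted(beta)   # b[i - 1] is beta'_i
    # C'_min: alpha_i against beta'_1 + ... + beta'_{d-i+1}.
    c_min = sum(min(a[i - 1], sum(b[: d - i + 1])) for i in range(1, k + 1))
    # C'_max: alpha_i against beta'_{i+1} + ... + beta'_{d+1}.
    c_max = sum(min(a[i - 1], sum(b[i : d + 1])) for i in range(1, k + 1))
    return c_min, c_max

# Example 6: (n, k, d) = (3, 2, 2), capacities (1, 2, 2), bandwidths (1, 2, 2).
bounds = theorem5_bounds([1, 2, 2], [1, 2, 2], k=2, d=2)
```

For the parameters of Example 6 this returns $(2, 3)$, in agreement with the values quoted there.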

Example 7: Consider now a second DSS with $(n, k, d) = (3, 2, 2)$
and $(\alpha_1, \beta_1) = (5, 3)$, $(\alpha_2, \beta_2) = (6, 4)$ and
$(\alpha_3, \beta_3) = (7, 5)$. Here, $C'_{\min} = 9$ and $C'_{\max} = 11$, and Th. 1
gives $C \le 10 < C'_{\max}$.

The upper and lower bounds can coincide ($C'_{\min} = C'_{\max}$) in
certain cases, which gives the exact expression of the capacity.
For example:

1) A homogeneous DSS, where we recover the capacity
expression in (2).

2) A DSS with uniform repair bandwidth, i.e., $\beta_i = \beta, \forall i$.
The capacity is $C = \sum_{i=1}^{k} \min(\alpha_i, (d - i + 1)\beta)$.

3) Whenever $\alpha_i \le \beta'_1, \forall i$. In this case the capacity is
$C = \sum_{i=1}^{k} \alpha_i$.

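To prove the upper and lower bounds in Th. 5 we first
establish the following expression of the DSS capacity.

Theorem 8: The capacity $C$ of a heterogeneous DSS is
given by

$$C = \min_{\substack{(f_1, \dots, f_k) \\ f_i \ne f_j \text{ for } i \ne j}} \; \sum_{i=1}^{k} \min\left( \alpha_{f_i}, \min_{\substack{|S_i| = d + 1 - i \\ \{f_1, \dots, f_i\} \cap S_i = \emptyset}} \beta_{S_i} \right), \qquad (9)$$

where for any $S \subseteq \{1, \dots, n\}$, $\beta_S = \sum_{i \in S} \beta_i$.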

Fig. 2. A series of $k$ failures and repairs in the DSS that explains the capacity
expression in (9). Nodes $v_{f_1}, \dots, v_{f_k}$ fail successively and are repaired as
depicted above. The amount of "new" information that node $v_{f_i}$ can give the
user is the minimum between its storage capacity $\alpha_{f_i}$ and its downloaded data.

The proof of Th. 8 is a generalization of the proof in [1]
of the capacity of a homogeneous system (2). We defer this
proof to Appendix C and explain here the intuition behind
it. Consider the scenario depicted in Fig. 2 where nodes
$v_{f_1}, \dots, v_{f_k}$ fail and are repaired successively such that node
$v_{f_i}$ is repaired by downloading data from the previously
repaired nodes $v_{f_1}, \dots, v_{f_{i-1}}$ and $d - (i - 1)$ other helper
nodes in the system. Consider now a user contacting nodes
$v_{f_1}, \dots, v_{f_k}$.
The amount of "non-redundant" information that node $v_{f_i}$
can give to the user is evidently limited by its storage capacity
$\alpha_{f_i}$ on one hand, and on the other hand, by the amount of
information downloaded from the $d - i + 1$ helper nodes
that are not connected to the user. Minimizing over all the
choices of $f_1, \dots, f_k$ gives the expression in (9).
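For small systems, the minimization described above can be carried out by exhaustive search. The following sketch assumes that the bandwidth depends only on the helper node, so that the cheapest helper set for the $i$th repair consists of the $d + 1 - i$ smallest bandwidths among the nodes outside $\{f_1, \dots, f_i\}$. The function name `capacity_bruteforce` is an illustrative choice, and the number of sequences examined is $n!/(n-k)!$, so this is only meant for toy examples.

```python
from itertools import permutations

def capacity_bruteforce(alpha, beta, k, d):
    """Minimize, over ordered failure sequences, the sum of per-node contributions."""
    n = len(alpha)
    best = None
    for f in permutations(range(n), k):
        total = 0
        for i in range(1, k + 1):
            # Helpers must avoid the nodes f_1, ..., f_i.
            excluded = set(f[:i])
            remaining = sorted(beta[v] for v in range(n) if v not in excluded)
            # Cheapest set of d + 1 - i helpers outside the excluded nodes.
            helper_sum = sum(remaining[: d + 1 - i])
            total += min(alpha[f[i - 1]], helper_sum)
        best = total if best is None else min(best, total)
    return best

# Three-node example of Section IV-A.
capacity = capacity_bruteforce([1, 2, 2], [1, 2, 2], k=2, d=2)
```

Applied to the three-node example of Section IV-A, the search returns 3, the capacity found there.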

It is not clear whether the capacity expression in (9) can be
computed efficiently. For this reason we give upper and lower
bounds that are easy to compute. To get the lower bound in
(7), let $(f_1, \dots, f_k) = (f_1^*, \dots, f_k^*)$ be the minimizer of (9).
We have

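$$C = \sum_{i=1}^{k} \min\left( \alpha_{f_i^*}, \min_{S_i} \beta_{S_i} \right) \ge \sum_{i=1}^{l^*} \alpha_i + \sum_{j=1}^{k-l^*} \sum_{i=1}^{d-l^*-j+1} \beta'_i \ge C'_{\min}, \qquad (10)$$

where $l^*$, $0 \le l^* \le k$, is the number of those cases where $\alpha_{f_i^*}$
is smaller than or equal to the corresponding sum of $\beta$'s.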

The upper bound $C'_{\max}$ is obtained by taking $(f_1, \dots, f_k) =
(1, \dots, k)$ in (9) and following similar steps as above.

V. Security 

A. Secrecy Capacity 

We now consider the case in which $\ell$ nodes in the system
are compromised by a passive eavesdropper who can observe
their downloaded and stored data, but cannot alter it. The
secrecy capacity $C_s$ of the system is the maximum amount of
information that can be delivered to any user without revealing
any information to the eavesdropper (perfect secrecy).

Formally, let $S$ be the information source that represents
the file that is stored on the DSS. A user contacts the nodes
in any set $B \subset \{v_1, \dots, v_n\}$ of size $k$ and downloads their
stored data denoted by $C_B$. The user should be able to decode
the file, which implies $H(S|C_B) = 0$. Let $E$ be the set of
the $\ell$ compromised nodes, and $D_E$ be the data observed by
the eavesdropper. The perfect secrecy condition implies that
$H(S|D_E) = H(S)$. Following the definition in [16], we write
the secrecy capacity as

$$C_s(\alpha, \gamma) = \sup_{\substack{H(S|C_B) = 0 \ \forall B \\ H(S|D_E) = H(S) \ \forall E}} H(S). \qquad (11)$$



Finding the secrecy capacity of a DSS is a hard problem
and is still open in general, even for the class of homogeneous
systems. Let $C_s^{ho}(\alpha, \gamma, \ell)$ denote the secrecy capacity of a
homogeneous DSS implementing symmetric repair and having
$\ell$ compromised nodes. Following the same steps in the proof
of Th. 1 we can show that the secrecy capacity $C_s$ of a
heterogeneous DSS cannot exceed that of a homogeneous DSS
having the same average resources.

Theorem 9: Consider a heterogeneous DSS with average
storage capacity per node $\bar\alpha$, average repair bandwidth $\bar\gamma$, and
$\ell$ compromised nodes. The secrecy capacity of this system is
upper bounded by

$$C_s \le C_s^{ho}(\bar\alpha, \bar\gamma, \ell). \qquad (12)$$

Equations (12) and (3) imply the following upper bound stated
in Th. 4:

$$C_s \le \sum_{i=\ell+1}^{k} \min\left\{\bar\alpha, (d - i + 1)\frac{\bar\gamma}{d}\right\}. \qquad (13)$$

Using Th. 9 we easily deduce that symmetric repair is
also optimal in terms of maximizing the secrecy capacity of a 
compromised DSS. 

Corollary 10: Symmetric repair maximizes the secrecy ca- 
pacity of a homogeneous system with a given budget on total 
repair bandwidth. 
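The bound (13) can be evaluated in the same way as (2), with the sum starting at $i = \ell + 1$. The sketch below is a minimal illustration; the function name `secrecy_upper_bound` and the choice $\ell = 1$ for the three-node example of Section IV-A are assumptions made here, not taken from the text.

```python
from fractions import Fraction

def secrecy_upper_bound(alpha_bar, gamma_bar, k, d, ell):
    """Sum over i = ell+1..k of min(alpha_bar, (d - i + 1) * gamma_bar / d)."""
    per_helper = Fraction(gamma_bar) / d
    return sum(
        min(Fraction(alpha_bar), (d - i + 1) * per_helper)
        for i in range(ell + 1, k + 1)
    )

# Averages of the three-node example: alpha_bar = 5/3, gamma_bar = 10/3.
bound = secrecy_upper_bound(Fraction(5, 3), Fraction(10, 3), k=2, d=2, ell=1)
no_eavesdropper = secrecy_upper_bound(Fraction(5, 3), Fraction(10, 3), k=2, d=2, ell=0)
```

With $\ell = 0$ the expression reduces to the bound of Th. 1 ($10/3$ for these averages), while $\ell = 1$ gives $5/3$.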

VI. Conclusion 

We have studied distributed storage systems that are het- 
erogeneous. Nodes in these systems can have different storage 
capacities and different repair bandwidths. We have focused 
on determining the information theoretic capacity of these 
systems, i.e., the maximum amount of information they can 
store, to achieve a required level of reliability (any k out of the 
n nodes should be able to give a stored file to a user). We have 
proved an upper bound on the capacity that depends on the 
average resources available per node. Moreover, we have given 
an expression for the system capacity when we know all the 
nodes' parameters. This expression may be hard to compute, 
but we use it to derive additional upper and lower bounds that 
are easy to evaluate. We have also studied the case in which the 
system is compromised by an eavesdropper, and have provided 



bounds on the system secrecy capacity under a perfect secrecy 
constraint. Our results imply that symmetric repair maximizes 
the capacity of a homogeneous system, which justifies the 
repair model used in the literature. Problems that remain open 
include finding an efficient algorithm to compute the capacity 
of a heterogeneous distributed storage system, as well as 
efficient code constructions. 

Appendix 
A. Functional vs. Exact Repair 

All of our results so far assumed a functional repair model.
However, Theorems 1, 4 and 9 can be directly extended to the
exact repair case. For instance, Th. 1 becomes:

Theorem 11: The capacity $C$ of a heterogeneous distributed
storage system under exact repair, with node average capacity
$\bar\alpha$ and average repair bandwidth $\bar\gamma$, is upper bounded by

$$C \le C^{ho}_{exact}(\bar\alpha, \bar\gamma), \qquad (14)$$

where $C^{ho}_{exact}(\alpha, \gamma)$ is the capacity of a homogeneous DSS
under exact repair.

In the proofs of Theorems 1, 4 and 9 we construct a new
"big" storage system using the original one as a building block. 
Hence, if we had exact repair in the original system to start 
with, we will have exact repair in the new "big" system. The 
results can thus be straightforwardly generalized to the case 
of exact repair. Moreover, under an exact repair constraint, a 
homogeneous DSS with symmetric repair maximizes capac- 
ity under given average node storage and repair bandwidth 
budgets. 

The other results, namely Theorems 3, 5 and 8, are proved
using the analysis of the information flow graph. Therefore, it
is not clear if there is an obvious extension of these results to
the case of exact repair.

B. Proof of Theorem 1

We prove Th. 1 by making formal the argument of the
example in Section IV-A. We start by describing the operation
of adding, or combining, together multiple storage systems
having the same number of nodes. Let $\mathcal{DSS}_1, \mathcal{DSS}_2$ be two
storage systems with nodes $v_1^1, \dots, v_n^1$ and $v_1^2, \dots, v_n^2$,
respectively. The new system that we refer to as $\mathcal{DSS}$ obtained
by combining $\mathcal{DSS}_1$ and $\mathcal{DSS}_2$ is comprised of $n$ nodes,
say $u_1, \dots, u_n$. Node $u_i$ has storage capacity $\alpha_i = \alpha_i^1 + \alpha_i^2$
(superscript $j$, $j = 1, 2$, denotes a parameter of system $\mathcal{DSS}_j$).
Moreover, when node $u_j$ fails in $\mathcal{DSS}$, the new node downloads
$\beta_{ijS} = \beta^1_{ijS} + \beta^2_{ijS}$ amount of information from helper
node $u_i$ (recall that $S$ is the set of indices of the $d$ helper
nodes). We write $\mathcal{DSS} = \mathcal{DSS}_1 + \mathcal{DSS}_2$.

Now, let $\mathcal{DSS}$ be the given heterogeneous system for which
we wish to compute the capacity $C$. For each permutation
$\sigma : \{1, \dots, n\} \to \{1, \dots, n\}$, we denote by $\mathcal{DSS}^\sigma$ the storage
system with nodes $v_1^\sigma, \dots, v_n^\sigma$ such that $v_i^\sigma = v_{\sigma(i)}$. Let $\mathcal{P}_n$
denote the set of all $n!$ permutations on the set $\{1, \dots, n\}$.
We define a new "big" system by

$$\mathcal{DSS}_b = \sum_{\sigma \in \mathcal{P}_n} \mathcal{DSS}^\sigma.$$

The new system $\mathcal{DSS}_b$ is homogeneous with symmetric repair
where the storage capacity per node $\alpha_b$ is given by

$$\alpha_b = (n - 1)! \sum_{i=1}^{n} \alpha_i = n!\,\bar\alpha,$$

and the repair bandwidth per helper is given by

$$\beta_b = (n - d - 1)!\,(d - 1)! \sum_{j=1}^{n} \sum_{i=1}^{n} \sum_{\substack{S : |S| = d \\ i \in S,\ j \notin S}} \beta_{ijS} = (n - d - 1)!\,(d - 1)! \sum_{j=1}^{n} \binom{n-1}{d} \gamma_j = n!\,\frac{\bar\gamma}{d}. \qquad (15)$$

Therefore, the capacity $C_b$ of $\mathcal{DSS}_b$ as given by (2) is

$$C_b = n! \sum_{i=1}^{k} \min\left\{\bar\alpha, (d - i + 1)\frac{\bar\gamma}{d}\right\}. \qquad (16)$$

Any scheme achieving the capacity $C$ of the original system
can be naturally extended to store a file of size $n!\,C$ in $\mathcal{DSS}_b$
(see Fig. 1). Therefore, $C_b \ge n!\,C$. This inequality combined
with (16) gives the result of Th. 1.
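As a sanity check, the two counting formulas can be evaluated for the example of Section IV-A, where $(n, k, d) = (3, 2, 2)$, $(\alpha_1, \alpha_2, \alpha_3) = (\beta_1, \beta_2, \beta_3) = (1, 2, 2)$ and $\beta_{ijS} = \beta_i$. The only assumption is that the inner sums range over helper sets $S$ with $i \in S$ and $j \notin S$, which for $n = 3$, $d = 2$ leaves a single helper set per failed node.

```latex
\alpha_b = (n-1)! \sum_{i=1}^{n} \alpha_i = 2! \cdot (1 + 2 + 2) = 10 = 3! \cdot \tfrac{5}{3},
\qquad
\beta_b = (n-d-1)!\,(d-1)! \sum_{j=1}^{n} \sum_{i \ne j} \beta_i
        = 0! \cdot 1! \cdot (4 + 3 + 3) = 10 = 3! \cdot \frac{10/3}{2}.
```

Both values agree with the storage and per-helper bandwidth of the combined system in Fig. 1(b).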

C. Proof of Theorem 8 (sketch)

We use the definition of the flow graph in [1] to represent
the DSS. The flow graph is a multicast network in which
the multiple destinations correspond to the users requesting
files from the DSS by contacting any $k$ out of the $n$ nodes.
Therefore, the capacity of the DSS is the capacity of this
multicast network which is equal to the minimum value of the
min-cuts to the users, by the fundamental theorem of network
coding. Note that in the flow graph, a storage node $v_i$ is
represented by two vertices $x^i_{in}$ and $x^i_{out}$ connected by an
edge of capacity $\alpha_i$ (see Fig. 2).

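Let $C$ be the capacity of the DSS and define $F$ to be

$$F = \min_{\substack{(f_1, \dots, f_k) \\ f_i \ne f_j \text{ for } i \ne j}} \; \sum_{i=1}^{k} \min\left( \alpha_{f_i}, \min_{\substack{|S_i| = d + 1 - i \\ \{f_1, \dots, f_i\} \cap S_i = \emptyset}} \beta_{S_i} \right).$$

We want to show that $C = F$.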

Let $(f_1, \dots, f_k)$ be fixed and consider the successive failures
and repairs of nodes $v_{f_1}, \dots, v_{f_k}$ as seen in Fig. 2.
Suppose node $v_{f_1}$ is repaired by contacting the helper nodes
that minimize the sum $\beta_{S_1}$ with $|S_1| = d$ and $\{f_1\} \cap S_1 = \emptyset$,
and node $v_{f_2}$ is repaired by contacting node $v_{f_1}$ and the $d - 1$
helper nodes that minimize the sum $\beta_{S_2}$ with $|S_2| = d - 1$ and
$\{f_1, f_2\} \cap S_2 = \emptyset$. We continue in this fashion and finish with
node $v_{f_k}$ being repaired by contacting nodes $v_{f_1}, \dots, v_{f_{k-1}}$
and the $d - k + 1$ helper nodes that minimize $\beta_{S_k}$ with
$|S_k| = d + 1 - k$ and $\{f_1, \dots, f_k\} \cap S_k = \emptyset$. Now consider a
user contacting nodes $v_{f_1}, \dots, v_{f_k}$. For the choice of $(f_1, \dots, f_k)$
that attains the minimum in $F$, there is a cut to the user of
value $F$. By the max-flow min-cut theorem, we get $C \le F$.



To prove the other direction, consider a user in the system
and let $E$ denote the edges in the min-cut that separates this
user from the source in the flow graph. Also, let $V$ be the set
of vertices in the flow graph that have a path to the user. Since
the flow graph is acyclic, we have a topological ordering of
the vertices in $V$, which means that they can be indexed such
that an edge from $v_i$ to $v_j$ implies $i < j$.

Let $x^1_{out}$ be the first "out-node" in $V$ (with respect to the
ordering). If $x^1_{in} \notin V$, then $x^1_{in} x^1_{out} \in E$. On the other hand,
if $x^1_{in} \in V$, then the set of incoming edges $S_1$, $|S_1| = d$, of
$x^1_{in}$ must be in $E$.

Now similarly let $x^2_{out}$ be the second "out-node" in $V$ with
respect to the ordering. If $x^2_{in} \notin V$, then $x^2_{in} x^2_{out} \in E$. If
$x^2_{in} \in V$, then the set $S_2$, $|S_2| \ge d - 1$, of edges incoming to
$x^2_{in}$, not including a possible edge from $x^1_{out}$, must be in $E$.
All $k$ nodes adjacent to the user must be in $V$ so continuing
in the same fashion gives that the min-cut is at least

$$\sum_{i=1}^{k} \min\left(\alpha_{f_i}, \beta_{S_i}\right),$$

where $f_i \ne f_j$ for $i \ne j$, $|S_i| = d + 1 - i$, and
$\{f_1, \dots, f_i\} \cap S_i = \emptyset$. Hence $C \ge F$.